
SHINE-Mapping: Large-Scale 3D Mapping Using Sparse Hierarchical Implicit Neural Representations

Xingguang Zhong, Yue Pan, Jens Behley, Cyrill Stachniss

2023 IEEE International Conference on Robotics and Automation (ICRA)

10.1109/ICRA48891.2023.10160907




Introduction

  • First work aiming to construct large-scale implicit neural maps from point cloud input
  • Low memory cost, high speed, high reconstruction quality

Method

  • Construct the world coordinate frame based on the first frame
  • Each level of the sparse hierarchical grid has a hash table storing the feature vectors of its nodes
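A minimal sketch of the per-level hash-table idea, using the classic three-prime spatial hash. All names and parameter values here are illustrative assumptions; the paper stores features at octree node corners and interpolates, which this sketch simplifies to a containing-voxel lookup:

```python
import numpy as np

# Illustrative sketch: one hash table per resolution level maps integer voxel
# coordinates to feature vectors. Primes are the standard spatial-hash choice.
PRIMES = (73856093, 19349669, 83492791)

def voxel_hash(ix, iy, iz, table_size):
    # XOR of coordinate-prime products, folded into the table range
    return (ix * PRIMES[0] ^ iy * PRIMES[1] ^ iz * PRIMES[2]) % table_size

class HierarchicalFeatureGrid:
    def __init__(self, num_levels=3, base_voxel=0.2, feat_dim=8, table_size=2**16):
        # coarser levels double the voxel size
        self.voxel_sizes = [base_voxel * (2 ** l) for l in range(num_levels)]
        self.tables = [{} for _ in range(num_levels)]  # hash key -> feature
        self.feat_dim = feat_dim
        self.table_size = table_size

    def query(self, point):
        # Sum the features of the containing voxel at every level; unseen
        # voxels are lazily initialized, mimicking sparse allocation.
        feats = []
        for level, vs in enumerate(self.voxel_sizes):
            ix, iy, iz = (int(np.floor(c / vs)) for c in point)
            key = voxel_hash(ix, iy, iz, self.table_size)
            feats.append(self.tables[level].setdefault(
                key, np.zeros(self.feat_dim)))
        return np.sum(feats, axis=0)
```

Only voxels that are actually queried ever allocate storage, which is what keeps the map memory sparse.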

Difficulties

  • Point Sampling
  • Incremental Mapping without forgetting


Training pairs and Loss Function

  • Properties of a LiDAR scene: sparse and noisy, so obtaining the true SDF is very difficult.
    • Directly obtain training pairs by sampling points along the ray, and use the signed distance from a sampled point to the beam endpoint as the signed distance between that point and the underlying surface.
  • How to solve the accuracy problem?
  • For SDF-based mapping, the regions of interest are the values close to zero, as they define the surfaces.
    • Sampled points closer to the endpoint should have a higher impact, since the precise SDF value far from a surface has very little influence.
  • Instead of an L2 loss, use a BCE (binary cross-entropy) loss; the SDF value is first mapped through a sigmoid function:
    • $L_{bce} = -\left[\, l_i \log(o_i) + (1 - l_i) \log(1 - o_i) \,\right]$
    • $o_i = \mathrm{sigmoid}(f_\theta(\mathbf{x}_i)), \quad l_i = \mathrm{sigmoid}(d_i)$
  • Sampling policy:
    • Sample $N_f$ points in the free space and $N_s$ points inside the truncation band $\pm 3\sigma$ around the surface.
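The sampling policy above can be sketched as follows. Function name, sample counts, and the sigmoid scaling by $\sigma$ are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

# Sketch of generating training pairs along one LiDAR ray: free-space samples
# between the sensor and the truncation band, plus samples within +-3 sigma
# of the measured endpoint. The projected signed distance to the endpoint is
# the SDF label, squashed by a sigmoid for the BCE loss.
def sample_along_ray(origin, endpoint, n_free=4, n_surface=6, sigma=0.05,
                     rng=np.random.default_rng(0)):
    direction = endpoint - origin
    depth = np.linalg.norm(direction)
    direction = direction / depth
    # free space: uniform between sensor and the start of the truncation band
    t_free = rng.uniform(0.0, depth - 3 * sigma, n_free)
    # truncation band: uniform within +-3 sigma around the endpoint
    t_surf = rng.uniform(depth - 3 * sigma, depth + 3 * sigma, n_surface)
    t = np.concatenate([t_free, t_surf])
    points = origin + t[:, None] * direction
    d = depth - t                              # projected signed distance
    labels = 1.0 / (1.0 + np.exp(-d / sigma))  # sigmoid label in (0, 1)
    return points, labels
```

Free-space samples get labels near 1 (well in front of the surface), while samples inside the band spread across (0, 1), concentrating the BCE gradient near the zero crossing.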

Why Sigmoid?

The sigmoid saturates away from the zero crossing, so the unreliable projected-distance labels far from the surface are compressed toward 0 or 1, while samples near the surface keep a strong gradient.

Loss Function

Eikonal Loss

$L_{eik} = \left( \left\| \frac{\partial f_\theta(\mathbf{x}_i)}{\partial \mathbf{x}_i} \right\| - 1 \right)^2$
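A quick numerical sanity check of the Eikonal property: a true SDF has unit gradient norm everywhere, so the loss is ~0, while a scaled field violates it. The finite-difference gradient here is an illustrative stand-in for autograd:

```python
import numpy as np

# Eikonal loss with a central finite-difference gradient (stand-in for
# automatic differentiation of the network f_theta).
def eikonal_loss(f, x, eps=1e-4):
    grad = np.array([
        (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
        for e in np.eye(3)
    ])
    return (np.linalg.norm(grad) - 1.0) ** 2

# Exact SDF of the unit sphere: gradient norm is 1 everywhere off-center.
sphere_sdf = lambda p: np.linalg.norm(p) - 1.0
# Scaling the field breaks the Eikonal property (gradient norm 2, loss 1).
scaled_field = lambda p: 2.0 * sphere_sdf(p)
```

Penalizing this loss at sampled points pushes the learned field toward a metrically correct signed distance rather than an arbitrary occupancy-like function.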

Incremental Mapping Without Forgetting

![Incremental mapping across frames T0, T1, T2](iidfinal.png)

  • We can only obtain partial observations of the environment at each frame (T0, T1, and T2). During incremental mapping, we use the data captured in area A0 to optimize the features V0, V1, V2, V3; after training converges, V0–V3 encode an accurate geometric representation of A0. However, if we move forward and use the data from frame T1 to train and update V0–V3, the network only focuses on reducing the loss in A1 and no longer cares about the performance in A0, which may degrade the reconstruction accuracy in A0. The same problem occurs from T1 to T2.
Regularization term

$L_r = \sum_{i} \Omega_i \left( \theta_i^{t} - \theta_i \right)^2$

Update the importance weight as the sensitivity of the loss on previous data to a parameter change, as suggested in prior incremental-learning research:

$\Omega_i = \min\!\left( \Omega_i + \sum_{k=1}^{N} \left| \frac{\partial L_{bce}(x_k, l_k)}{\partial \theta_i} \right|,\; \Omega_m \right)$
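The two equations above can be sketched together. Function names and the toy values are assumptions for illustration; in practice the gradients come from backpropagating the BCE loss over the last area's samples:

```python
import numpy as np

# Importance-weight update: accumulate per-parameter gradient magnitudes of
# the BCE loss over the N samples of the previous area, clamped at Omega_m.
def update_importance(omega, grads, omega_max):
    # grads has shape (N, num_params): |dL_bce(x_k, l_k)/dtheta_i| per sample
    return np.minimum(omega + np.abs(grads).sum(axis=0), omega_max)

# Regularizer: penalize drift of important parameters from their converged
# values theta_t, i.e. L_r = sum_i Omega_i * (theta_t_i - theta_i)^2.
def regularizer(theta, theta_t, omega):
    return np.sum(omega * (theta_t - theta) ** 2)
```

Parameters that mattered for previously mapped areas accumulate large $\Omega_i$ and are held near their old values, while unimportant ones stay free to adapt to the new area.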

Experiments

![qualitative_mai_v3.png](qualitative_mai_v3.png)
A comparison of different methods on the MaiCity dataset. The first row shows the reconstructed mesh, with a tree highlighted in the black box. The second row shows the error map of the reconstruction overlaid on the ground-truth mesh, where the blue-to-red colormap indicates the signed reconstruction error from -5 cm to +5 cm. (From left to right: SHINE, Voxblox (TSDF-based), VDB Fusion (TSDF-based), Puma (Poisson-based surface reconstruction), SHINE+DR (with a differentiable-rendering method).)

Experiments on the MaiCity dataset and the Newer College dataset


Comparison of map memory efficiency

A comparison of the incremental mapping results with and without regularization